Skip to content

Yeswanth/minimax fp4 gb300 b300 dynamo vllm disagg#1560

Open
yeswanthk-26 wants to merge 9 commits into
mainfrom
yeswanth/minimax-fp4-gb300-b300-dynamo-vllm-disagg
Open

Yeswanth/minimax fp4 gb300 b300 dynamo vllm disagg#1560
yeswanthk-26 wants to merge 9 commits into
mainfrom
yeswanth/minimax-fp4-gb300-b300-dynamo-vllm-disagg

Conversation

@yeswanthk-26
Copy link
Copy Markdown
Collaborator

@yeswanthk-26 yeswanthk-26 commented May 23, 2026

Summary

  • Add config entries in .github/configs/nvidia-master.yaml for:
    • minimaxm2.5-fp4-gb300-dynamo-vllm
    • minimaxm2.5-fp4-b300-dynamo-vllm
  • Add recipe sets under:
    • benchmarks/multi_node/srt-slurm-recipes/vllm/minimax-m2.5/
    • benchmarks/multi_node/srt-slurm-recipes/vllm/minimax-m2.5-b300/
  • Wire runner logic in:
    • runners/launch_gb300-nv.sh
    • runners/launch_b300-nv.sh
  • Update perf-changelog.yaml with entries for both new configs.

Note

Low Risk
Changes are benchmark/CI YAML and launch scripts only; no production serving or auth paths. Runner depends on an external srt-slurm fork branch for new schema fields.

Overview
Adds disaggregated multinode Dynamo + vLLM benchmark coverage for MiniMax-M2.5 NVFP4 on GB300 and B300, using vllm/vllm-openai:v0.20.1 and nvidia/MiniMax-M2.5-NVFP4.

.github/configs/nvidia-master.yaml gains minimaxm2.5-fp4-gb300-dynamo-vllm and minimaxm2.5-fp4-b300-dynamo-vllm: multinode, disagg prefill/decode search spaces for 1k/1k and 8k/1k ISL/OSL, each point pointing at a CONFIG_FILE under recipes/vllm/minimax-m2.5/.... B300 largely mirrors GB300 but adds tp8-1p1d at low concurrency for both sequence lengths.

New Slurm recipe trees under benchmarks/multi_node/srt-slurm-recipes/vllm/minimax-m2.5/ (GB300) and minimax-m2.5-b300/ (B300): Dynamo frontend, Nixl KV transfer, TP/EP/DP decode variants, and sa-bench concurrency sweeps.

runners/launch_gb300-nv.sh and launch_b300-nv.sh route minimaxm2.5 + fp4 + dynamo-vllm to the model path/prefix, clone jasonlizhengjian/srt-slurm @ lijas/spread-workers, and copy the matching recipe dir into recipes/vllm/minimax-m2.5. GB300 also hardens import_squash when enroot leaves symlink locks/squash files (ELOOP).

perf-changelog.yaml documents both new config keys.

Reviewed by Cursor Bugbot for commit bea3877. Bugbot is set up for automated code reviews on this repo. Configure here.

Port PR57 MiniMax-M2.5 FP4 disaggregated multi-node vLLM recipes for GB300 and B300 to SA upstream, including runner wiring and changelog entries while keeping SA-default GB300 account/partition/sqsh paths.
Remove the minimax-specific SLURM account override in launch_b300-nv.sh so SA upstream runs with batch_1/benchmark defaults, per reviewer guidance.
Comment on lines +3 to +6
# B300-only: full-node TP=8 decode (the 8 GPUs of a single B300 node).
# Cousin of tp4-1p1d.yaml but exercises the wider TP that B300's per-node
# GPU count makes available. Only the smallest concurrencies (1,4,8) —
# this topology is decode-latency focused, not throughput.
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🟡 The tp8-1p1d.yaml header comments claim 'Only the smallest concurrencies (1,4,8)' (1k1k) / 'Smallest concurrencies only (1,4,8)' (8k1k), but both files set concurrencies: "4" — only 4, not 1/4/8. The nvidia-master.yaml entries for tp8-1p1d list conc-list [4] for both ISLs, confirming the recipes are authoritative and the comments are stale. Either update the comments to say 'only conc 4' or expand the concurrencies field to actually include 1 and 8.

Extended reasoning...

What the bug is\n\nTwo new B300-only recipe files document a sweep over three concurrencies in their header comments but only run a single concurrency in the actual benchmark block.\n\nIn benchmarks/multi_node/srt-slurm-recipes/vllm/minimax-m2.5-b300/1k1k/tp8-1p1d.yaml (lines 3–6):\n\nyaml\n# B300-only: full-node TP=8 decode (the 8 GPUs of a single B300 node).\n# Cousin of tp4-1p1d.yaml but exercises the wider TP that B300's per-node\n# GPU count makes available. Only the smallest concurrencies (1,4,8) —\n# this topology is decode-latency focused, not throughput.\n\n\nBut the benchmark block at the bottom is:\n\nyaml\nbenchmark:\n type: "sa-bench"\n isl: 1024\n osl: 1024\n concurrencies: "4"\n random_range_ratio: 0.8\n\n\nThe 8k1k counterpart (benchmarks/multi_node/srt-slurm-recipes/vllm/minimax-m2.5-b300/8k1k/tp8-1p1d.yaml) has the same drift — its header says "Smallest concurrencies only (1,4,8)" while concurrencies: "4".\n\nWhich side is authoritative\n\nCross-referencing .github/configs/nvidia-master.yaml, the tp8-1p1d search-space entries for both ISLs only list conc-list: [4]:\n\nyaml\n# 1k1k\n- conc-list: [4]\n prefill: ...\n additional-settings:\n - "CONFIG_FILE=recipes/vllm/minimax-m2.5/1k1k/tp8-1p1d.yaml"\n# 8k1k\n- conc-list: [4]\n prefill: ...\n additional-settings:\n - "CONFIG_FILE=recipes/vllm/minimax-m2.5/8k1k/tp8-1p1d.yaml"\n\n\nSo the recipe concurrencies: "4" and the master conc-list: [4] agree, and the header comments are the outlier. The driver only iterates over what's in those two fields; the comment has no runtime effect.\n\nStep-by-step proof\n\n1. Open 1k1k/tp8-1p1d.yaml. Comment block (lines 3–6) says "Only the smallest concurrencies (1,4,8)".\n2. Scroll to line 75 — concurrencies: "4". The driver parses this "x"-delimited string, yielding the single value [4].\n3. .github/configs/nvidia-master.yaml has conc-list: [4] for this recipe — matches the recipe, not the comment.\n4. Same pattern repeats in 8k1k/tp8-1p1d.yaml.\n\nWhy existing code doesn't prevent it\n\nThere is no validator that checks free-form comments against the actual concurrencies field, so the drift slipped through.\n\nImpact\n\nNone at runtime — benchmarks run exactly the conc set defined in concurrencies and conc-list. The only consequence is that a future reader (or copy-paste donor) of these recipes will be misled into believing tp8-1p1d covers conc 1 and 8 in addition to 4. That's a minor doc-hygiene issue, not a correctness bug.\n\nHow to fix\n\nPick one of:\n\n- Update the header comments to # Only conc 4 — this topology is decode-latency focused, not throughput. (and the equivalent on the 8k1k file), which matches the current concurrencies field and the master conc-list, or\n- Expand the recipes to concurrencies: "1x4x8" and the master conc-list: [1, 4, 8] if covering 1 and 8 was the actual intent.

Comment thread perf-changelog.yaml Outdated
Comment on lines +3125 to +3133
pr-link: https://github.com/SemiAnalysisAI/InferenceX/pull/XXXX

- config-keys:
- minimaxm2.5-fp4-gb300-dynamo-vllm
description:
- "Reactivate GB300 minimax disagg vLLM sweep with the pruned pareto-only search space from internal PR57"
- "Add minimax model routing and minimax srt-slurm fork path in runners/launch_gb300-nv.sh while keeping SA upstream defaults (SLURM_PARTITION=batch_1, SLURM_ACCOUNT=benchmark, SQUASH_FILE under /home/sa-shared/gharunners/squash/)"
- "Add 1k1k/8k1k minimax recipe set under benchmarks/multi_node/srt-slurm-recipes/vllm/minimax-m2.5/"
pr-link: https://github.com/SemiAnalysisAI/InferenceX/pull/XXXX
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🟡 Both new perf-changelog.yaml entries (lines 3125 and 3133) ship with placeholder pr-link: https://github.com/SemiAnalysisAI/InferenceX/pull/XXXX URLs that will 404 once merged. The glm5 entry directly above (line 3115) was correctly populated with pull/1514 — these two should likewise be updated to reference this PR (#1560) before merging.

Extended reasoning...

What the bug is

perf-changelog.yaml gains two new changelog entries in this PR, one for minimaxm2.5-fp4-b300-dynamo-vllm and one for minimaxm2.5-fp4-gb300-dynamo-vllm. Both entries terminate with a literal placeholder URL:

  pr-link: https://github.com/SemiAnalysisAI/InferenceX/pull/XXXX

The XXXX is meant to be replaced with the actual PR number prior to merging. Per the PR metadata, this is PR #1560, so both placeholders should be replaced with pull/1560.

How it manifests

Once merged, anyone consuming the changelog (UI surfaces that render pr-link, scripts that crawl the file to attribute regressions, humans clicking through history) will follow these URLs to https://github.com/SemiAnalysisAI/InferenceX/pull/XXXX, which is a non-numeric pull number and GitHub will return 404. The two new entries effectively lose their provenance back to the PR that introduced them.

Why existing code doesn't prevent it

The changelog is a hand-edited YAML file with no schema validation in this PR or CI that I can see — placeholders like XXXX parse as valid strings, so YAML lint passes and the entry merges cleanly. The convention is enforced by reviewer attention only, and the glm5 entry immediately above (line 3115, pull/1514) demonstrates the expected post-fix state.

Proof / step-by-step

  1. Open perf-changelog.yaml at line 3115 — the prior glm5-fp4-gb300-dynamo-sglang entry has pr-link: https://github.com/SemiAnalysisAI/InferenceX/pull/1514, the real PR for that change.
  2. Continue to line 3125 — the new minimaxm2.5-fp4-b300-dynamo-vllm entry has pr-link: https://github.com/SemiAnalysisAI/InferenceX/pull/XXXX.
  3. Continue to line 3133 — the new minimaxm2.5-fp4-gb300-dynamo-vllm entry also has pr-link: https://github.com/SemiAnalysisAI/InferenceX/pull/XXXX.
  4. The PR header reports this is PR Yeswanth/minimax fp4 gb300 b300 dynamo vllm disagg #1560.
  5. Once merged, fetching https://github.com/SemiAnalysisAI/InferenceX/pull/XXXX will not resolve to a valid pull request (404), so the changelog ships with two dead links.

Fix

Replace both XXXX occurrences with 1560 before merging:

  pr-link: https://github.com/SemiAnalysisAI/InferenceX/pull/1560

This is a documentation/metadata issue only — no runtime impact — but worth fixing so the changelog stays a reliable index back to the introducing PRs (the same way the glm5 entry just above already does).

Removed redundant details from MiniMax-M2.5 NVFP4 B300 and GB300 descriptions in the changelog.
@github-actions
Copy link
Copy Markdown
Contributor

2 similar comments
@github-actions
Copy link
Copy Markdown
Contributor

@github-actions
Copy link
Copy Markdown
Contributor

Drop the confirmed unstable conc=512 datapoint from the b300 minimax 8k1k dep4-4p1d recipe and matching nvidia-master search-space to keep curve data consistent.
@yeswanthk-26 yeswanthk-26 force-pushed the yeswanth/minimax-fp4-gb300-b300-dynamo-vllm-disagg branch from 14c4dcf to 6d9c522 Compare May 25, 2026 23:21
@github-actions
Copy link
Copy Markdown
Contributor

@github-actions
Copy link
Copy Markdown
Contributor

Switch minimaxm2.5-fp4-gb300-dynamo-vllm from generic gb300 label to gb300-nv so jobs use launch_gb300-nv.sh and avoid unsupported gb300-cw routing.
Copy link
Copy Markdown

@cursor cursor Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cursor Bugbot has reviewed your changes and found 2 potential issues.

Fix All in Cursor

❌ Bugbot Autofix is OFF. To automatically fix reported issues with cloud agents, enable autofix in the Cursor dashboard.

Reviewed by Cursor Bugbot for commit a135a6a. Configure here.

Comment thread runners/launch_gb300-nv.sh
Comment thread runners/launch_gb300-nv.sh Outdated
Use /scratch/models/MiniMax-M2.5-NVFP4 in launch_gb300-nv.sh so gb300 srtslurm model path resolution matches the cluster's staged model layout.
Remove the login-node fast-path skip in import_squash so every existing squash is validated on compute nodes via unsquashfs before reuse.
@yeswanthk-26 yeswanthk-26 force-pushed the yeswanth/minimax-fp4-gb300-b300-dynamo-vllm-disagg branch from 9d18e00 to bea3877 Compare May 26, 2026 21:33
@github-actions
Copy link
Copy Markdown
Contributor

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

Status: No status

Development

Successfully merging this pull request may close these issues.

2 participants